Dealing with structural patterns of XML documents
نویسندگان
چکیده
Evaluating collections of XML documents without paying attention to the schema they were written in may give interesting insights about the expected characteristics of a markup language, as well as bout any regularity that may span across vocabularies and languages, and that are more fundamental and frequent than plain content models. In this paper we explore the idea of structural patterns in XML vocabularies, by examining the characteristics of elements as they are used, rather than as they are defined. We introduce from the ground up a formal theory of eight plus three structural patterns for XML elements, and verify their identifiability in a number of different XML vocabularies. The satisfying results have allowed the creation of visualization and content extraction tools that are completely independent of the schema and without any previous knowledge of the semantics and organization of the XML vocabulary of the documents.
منابع مشابه
خوشهبندی فراابتکاری اسناد فارسی اِکساِماِل مبتنی بر شباهت ساختاری و محتوایی
Due to the increasing number of documents, XML, effectively organize these documents in order to retrieve useful information from them is essential. A possible solution is performed on the clustering of XML documents in order to discover knowledge. Clustering XML documents is a key issue of how to measure the similarity between XML documents. Conventional clustering of text documents using a do...
متن کاملNATIVE XML DATABASES vs. RELATIONAL DATABASES IN DEALING WITH XML DOCUMENTS
When dealing with data-centric XML documents, it is possible to convert XML documents into a relational database, which can then be queried using SQL. Such relational databases are called XML-enabled databases. On the other hand, the best choice for storing, updating and retrieving document-centric XML documents is usually a native XML database (NXD). NXDs store XML documents as logical units, ...
متن کاملStructural patterns for document engineering: from an empirical bottom-up analysis to an ontological theory
This thesis aims at investigating a new approach to document analysis based on the idea of structural patterns in XML vocabularies. My work is founded on the belief that authors do naturally converge to a reasonable use of markup languages and that extreme, yet valid instances are rare and limited. Actual documents, therefore, may be used to derive classes of elements (patterns) persisting acro...
متن کاملMining Frequently Changing Substructures from Historical Unordered XML Documents
Recently, there is an increasing research efforts in XML data mining. These efforts largely assumed that XML documents are static. However, in many real applications, XML data are evolutionary in nature. In this paper, we focus on mining evolution patterns from historical XML documents. Specifically, we propose a novel approach to discover frequently changing structures (FCS) from a sequence of...
متن کاملPrototyping a Vibrato-Aware Query-By-Humming (QBH) Music Information Retrieval System for Mobile Communication Devices: Case of Chromatic Harmonica
Background and Aim: The current research aims at prototyping query-by-humming music information retrieval systems for smart phones. Methods: This multi-method research follows simulation technique from mixed models of the operations research methodology, and the documentary research method, simultaneously. Two chromatic harmonica albums comprised the research population. To achieve the purpose ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- JASIST
دوره 65 شماره
صفحات -
تاریخ انتشار 2014